
    A fast algorithm for genome-wide haplotype pattern mining

    BACKGROUND: Identifying the genetic components of common diseases has long been an important area of research. Recently, genotyping technology has reached the level where it is cost-effective to genotype single nucleotide polymorphism (SNP) markers covering the entire genome in thousands of individuals, and to analyse such data for markers associated with a disease. The statistical power to detect association, however, is limited when markers are analysed one at a time. This can be alleviated by considering multiple markers simultaneously. The Haplotype Pattern Mining (HPM) method is a machine learning approach that does exactly this. RESULTS: We present a new, faster algorithm for the HPM method. The new approach uses patterns of haplotype diversity in the genome: locally in the genome, the number of observed haplotypes is much smaller than the total number of possible haplotypes. We show that the new approach speeds up the HPM method by a factor of 2 on a genome-wide dataset with 5009 individuals genotyped at 491208 markers using default parameters, and by more if the pattern length is increased. CONCLUSION: The new algorithm speeds up the HPM method, and we show that it is feasible to apply HPM to whole-genome association mapping with thousands of individuals and hundreds of thousands of markers.
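The observation that only a few of the possible haplotypes occur in any local window can be illustrated with a minimal sketch: collapse the chromosomes to their distinct haplotypes within one window and keep a case/control count per haplotype, so that a later scoring step could visit each distinct haplotype once instead of each chromosome. This is not the published HPM algorithm; the function name, the 0/1 allele encoding and the `start`/`length` window parameters are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# Hypothetical encoding: each haplotype is a sequence of 0/1 alleles,
# one per SNP marker; is_case[j] marks whether chromosome j comes from a case.
Window = Tuple[int, ...]


def window_haplotype_counts(
    haplotypes: Sequence[Sequence[int]],
    is_case: Sequence[bool],
    start: int,
    length: int,
) -> Dict[Window, Tuple[int, int]]:
    """Collapse chromosomes to the distinct haplotypes seen in one window.

    Returns {alleles at markers start..start+length-1: (cases, controls)}.
    """
    cases: Counter = Counter()
    controls: Counter = Counter()
    for hap, case in zip(haplotypes, is_case):
        key = tuple(hap[start:start + length])
        if case:
            cases[key] += 1
        else:
            controls[key] += 1
    # Only observed haplotypes become keys, never all 2**length possibilities.
    return {key: (cases[key], controls[key]) for key in cases.keys() | controls.keys()}


if __name__ == "__main__":
    haps = [[0, 1, 0, 1], [0, 1, 0, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
    status = [True, False, True, True]
    counts = window_haplotype_counts(haps, status, start=0, length=3)
    print(len(counts), "observed of", 2 ** 3, "possible haplotypes")
    print(counts)
```

With the toy data above, four chromosomes collapse to 2 observed haplotypes out of 8 possible for a three-marker window; in real data the gap between observed and possible grows quickly with window length.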

    Whole genome association mapping by incompatibilities and local perfect phylogenies

    BACKGROUND: With current technology, vast amounts of data can be cheaply and efficiently produced in association studies, and to prevent data analysis from becoming the bottleneck of studies, fast and efficient analysis methods that scale to such data set sizes must be developed. RESULTS: We present a fast method for accurate localisation of disease-causing variants in high-density case-control association mapping experiments with large numbers of cases and controls. The method searches for significant clustering of case chromosomes in the "perfect" phylogenetic tree defined by the largest region around each marker that is compatible with a single phylogenetic tree. This perfect phylogenetic tree is treated as a decision tree for determining disease status and is scored by its accuracy as a decision tree. The rationale is that the perfect phylogeny near a disease-affecting mutation should provide more information about the affected/unaffected classification than random trees. If regions of compatibility contain few markers, e.g. due to large marker spacing, the algorithm can allow the inclusion of incompatible markers in order to enlarge the regions prior to estimating their phylogeny. Both haplotype data and phased genotype data can be analysed. The power and efficiency of the method are investigated on 1) simulated genotype data under different models of disease determination, 2) artificial data sets created from the HapMap resource, and 3) data sets used for testing other methods, in order to compare with these. Our method has the same accuracy as single marker association (SMA) in the simplest case of a single disease-causing mutation and a constant recombination rate.
However, in more complex scenarios of mutation heterogeneity and with more complex haplotype structure, such as that found in the HapMap data, our method outperforms SMA as well as other fast data mining approaches such as HapMiner and Haplotype Pattern Mining (HPM), despite being significantly faster. For unphased genotype data, an initial phase-estimation step only slightly decreases the power of the method. The method was also found to accurately localise a known susceptibility variant in an empirical data set (the ΔF508 mutation for cystic fibrosis) and to find significant signals for association between the CYP2D6 gene and poor drug metabolism, although for this dataset the highest association score is about 60 kb from the CYP2D6 gene. CONCLUSION: Our method has been implemented in the Blossoc (BLOck aSSOCiation) software. Using Blossoc, genome-wide chip-based surveys of 3 million SNPs in 1000 cases and 1000 controls can be analysed in less than two CPU hours.
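The abstract does not say how compatibility is tested. For binary SNP markers under an infinite-sites assumption, a standard criterion is the four-gamete test: two markers can lie on one perfect phylogeny unless all four allele combinations 00, 01, 10 and 11 occur, and for binary markers pairwise compatibility is enough for the whole set to fit one tree. The sketch below uses that test to find the widest compatible interval around a marker by brute force. It is not Blossoc's implementation: the names are hypothetical, and it omits the inclusion of incompatible markers, tree construction and decision-tree scoring.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: columns[m] lists the 0/1 alleles of marker m,
# one entry per (phased) chromosome.
Columns = Sequence[Sequence[int]]


def compatible(a: Sequence[int], b: Sequence[int]) -> bool:
    """Four-gamete test: two binary markers fit one tree unless all of
    00, 01, 10 and 11 occur among the chromosomes."""
    return len(set(zip(a, b))) < 4


def region_is_compatible(columns: Columns, lo: int, hi: int) -> bool:
    """True if every pair of markers in lo..hi (inclusive) is compatible."""
    return all(
        compatible(columns[p], columns[q])
        for p in range(lo, hi + 1)
        for q in range(p + 1, hi + 1)
    )


def largest_compatible_region(columns: Columns, i: int) -> Tuple[int, int]:
    """Widest interval (lo, hi) containing marker i whose markers are all
    pairwise compatible. Brute force; ties keep the interval found first."""
    best = (i, i)
    lo = i
    # Compatibility is inherited by sub-intervals, so once [lo, i] fails,
    # every interval starting further left fails too.
    while lo >= 0 and region_is_compatible(columns, lo, i):
        hi = i
        while hi + 1 < len(columns) and region_is_compatible(columns, lo, hi + 1):
            hi += 1
        if hi - lo > best[1] - best[0]:
            best = (lo, hi)
        lo -= 1
    return best


if __name__ == "__main__":
    cols: List[List[int]] = [
        [0, 0, 1, 1],
        [0, 0, 0, 1],
        [0, 1, 0, 1],
        [0, 0, 0, 1],
    ]
    print(largest_compatible_region(cols, 1))  # markers 1..3
    print(largest_compatible_region(cols, 0))  # markers 0..1
```

In the toy data, markers 0 and 2 show all four gametes, so no compatible interval can contain both; that is why the region around marker 0 stops at marker 1.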

    Sequential analysis of hair mercury levels in relation to fish diet of an Amazonian population, Brazil

    Several studies in the Amazon Basin have shown that riverine populations are exposed to methylmercury through fish consumption. It has been suggested that seasonal variations in hair mercury observed through sequential analyses may be related to changes in the fish species eaten by local communities. The aim of the present study was to investigate the relationship between fish-eating practices and seasonal variation in mercury exposure. The study population comprised 36 women from a village on the banks of the Tapajos River, a major tributary of the Amazon. An interviewer-administered questionnaire was used to gather information on socio-demographic characteristics, fish-eating practices and other relevant factors. The women also provided hair samples at least 24 cm long for mercury analysis. Total and inorganic mercury concentrations in hair were measured using a cold vapor atomic absorption method. Trigonometric regression analysis was used to assess the seasonal variation in total mercury levels. Variations in inorganic mercury were examined by repeated-measures analysis of variance and by analysis of contrast variables with a polynomial transformation. The results showed that hair mercury levels varied with the season: levels were higher in months corresponding to the dry season and lower in the rainy season. Herbivorous fish predominated in the diet of 47.2% of the women during the dry season, rising to 72.2% during the rainy season. Women who reported eating fish daily had higher hair mercury levels than those who ate fish only a few times per week. Retrospective mercury analyses, based on the quantity of mercury in each centimeter of hair, indicate that the mean mercury level of the population decreased over the 2 years prior to the study. The percentage of inorganic mercury relative to total mercury in hair increased towards the ends of the hair strand.
Higher percentages of inorganic mercury were found in the group that ate fish daily. These results support the assumption that there are seasonal variations in methylmercury exposure, and that the type of fish species consumed is related to the resulting hair mercury levels.
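The abstract names trigonometric regression without giving its form. One common form (cosinor regression) fits a mean level plus a single annual cosine/sine pair; a minimal sketch follows. It assumes, hypothetically, one value per hair centimeter, with each centimeter taken as about one month of growth (an assumption the abstract does not state), and it relies on equally spaced samples covering whole 12-month cycles, which makes the three regressors orthogonal so each least-squares coefficient reduces to a simple projection. The function name and the synthetic data are illustrative only.

```python
import math
from typing import Sequence, Tuple


def seasonal_fit(values: Sequence[float], period: int = 12) -> Tuple[float, float, float]:
    """Least-squares fit of y_t = M + a*cos(w t) + b*sin(w t), w = 2*pi/period.

    Hypothetical input: one value per hair centimeter, treated as one month.
    Assumes equally spaced samples covering whole cycles, so the constant,
    cosine and sine regressors are orthogonal and each coefficient is a
    simple projection. Returns (mean level M, amplitude, time of peak).
    """
    n = len(values)
    if period < 3 or n == 0 or n % period != 0:
        raise ValueError("need whole cycles of equally spaced samples, period >= 3")
    w = 2 * math.pi / period
    mesor = sum(values) / n
    a = 2 / n * sum(y * math.cos(w * t) for t, y in enumerate(values))
    b = 2 / n * sum(y * math.sin(w * t) for t, y in enumerate(values))
    amplitude = math.hypot(a, b)
    # a*cos + b*sin = A*cos(w t - phi) with phi = atan2(b, a); the peak is at phi / w.
    peak = (math.atan2(b, a) / w) % period
    return mesor, amplitude, peak


if __name__ == "__main__":
    # Synthetic 24-month series peaking at month index 3 (made-up data).
    w = 2 * math.pi / 12
    series = [5 + 2 * math.cos(w * (t - 3)) for t in range(24)]
    print(seasonal_fit(series))  # approximately (5.0, 2.0, 3.0)
```

With unevenly spaced or incomplete cycles the regressors are no longer orthogonal, and a general least-squares solver would be needed instead of the projection shortcut.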